Abstract

Much  human  learning  appears  to  be  gradual  and  unconscious,  suggesting  a  very  limited  form  of  search 
through  the  space  of  hypotheses.  We  propose  hill  climbing  as  a  framework  for  such  learning  and  consider 
a  number  of  systems  that  learn  in  this  manner.  We  focus  on  CLASSIT,  a  model  of  concept  formation 
that  incrementally  acquires  a  conceptual  hierarchy,  and  MAGGIE,  a  model  of  skill  improvement 
that  alters  motor  schemas  in  response  to  errors.  Both  models  integrate  the  processes  of  learning  and 
performance.


1.  Introduction 

Search  has  proved  to  be  a  powerful  metaphor  for  understanding  the  nature  of 
learning  (Mitchell,  1982;  Langley  &  Carbonell,  in  press).  Describing  a  learning  system 
in  terms  of  its  states,  operators,  and  evaluation  criteria  has  led  to  insights  into  learning 
tasks  themselves  and  into  relations  between  different  learning  methods.  However, 
much  of  the  search-based  work  on  empirical  (inductive)  learning  methods  has  relied 
on  methods  like  depth-first  search,  breadth-first  search,  and  beam  search.  Although 
these  may  be  useful  for  applied  learning  systems,  they  seem  implausible  as  models  of 
human  learning. 

1.1  Hill  Climbing  as  a  Metaphor  for  Learning 

In many domains, human learning seems to occur in a gradual, unconscious fashion.
Obvious examples of this mode include concept formation, grammar acquisition,
and motor learning. But even complex belief structures - such as those occurring in
scientific theories - may gradually evolve in this manner. We will argue that psychological
theories of such learning should be constrained along three dimensions:

• learning must be incremental; there should be no extensive reprocessing of previously
encountered instances;

• the learner can entertain only one ‘hypothesis’ at a time; i.e., competing alternatives
are not retained;

• the learner has no memory of previous hypotheses that it has held; thus, there
can be no direct backtracking.

Taken  together,  these  constraints  rule  out  nearly  all  forms  of  search.  However,  there 
is  one  very  weak  search  framework  -  hill  climbing  -  with  the  requisite  characteristics. 
In this paradigm, one begins with some initial structure in memory, often degenerate
in form (e.g., an empty decision tree). Given a new instance, the learner can
modify the current structure in a variety of ways, and each choice constitutes a step
through the space of structures. In order to select between the alternatives, the learner
invokes an evaluation function, selecting that structure with the best score. The previous
state of memory is forgotten, along with the alternative structures that were
not selected.1 This process continues as long as new instances are encountered. In
some cases, a constrained state generator replaces the evaluation function, producing
the new state deterministically from the current state and the instance.
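The paradigm just described can be illustrated with a short Python sketch. It is only a minimal rendering of the loop under stated assumptions, not the code of any system discussed in this paper: the names `incremental_hill_climb`, `successors`, `score`, `neighbors`, and `closeness` are hypothetical, and the evaluation function is assumed to see the current instance as well as the candidate structure.

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")  # type of the single memory structure (the current 'hypothesis')
X = TypeVar("X")  # type of an instance


def incremental_hill_climb(
    initial: S,
    instances: Iterable[X],
    successors: Callable[[S, X], List[S]],
    score: Callable[[S, X], float],
) -> S:
    """Revise one structure per instance, keeping no alternatives or history."""
    current = initial
    for instance in instances:
        candidates = successors(current, instance)
        if candidates:
            # Keep only the best-scoring candidate; the previous state and
            # the unselected alternatives are discarded.
            current = max(candidates, key=lambda s: score(s, instance))
    return current


# Toy usage (hypothetical): the 'structure' is a single integer estimate.
def neighbors(state: int, instance: int) -> List[int]:
    return [state - 1, state, state + 1]


def closeness(state: int, instance: int) -> float:
    return -abs(state - instance)


final = incremental_hill_climb(0, [3, 3, 3, 3], neighbors, closeness)
```

In the toy run the estimate moves one step toward each instance and then stays put. A deterministic state generator corresponds to a `successors` function that returns a single candidate.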

This  algorithm  differs  from  standard  hill-climbing  methods  in  that  steps  are  taken 
only  as  instances  arrive,  but  the  basic  structure  is  the  same.  Thus  it  is  subject  to  the 
same  limitations,  such  as  the  tendency  to  halt  at  a  local  optimum.  However,  one  does 
not  require  optimal  behavior  in  models  of  human  learning;  one  only  requires  them  to 
mimic  human  behavior.  Simon  (1969)  has  argued  that  in  complex  domains,  humans 
tend  to  satisfice.  In  this  light,  the  limits  of  hill-climbing  methods  may  be  an  asset. 

1.2  Earlier  Work  in  the  Hill-Climbing  Paradigm 

Until  the  resurgence  of  machine  learning  research  in  the  late  1970’s,  hill-climbing 
approaches  to  learning  were  reasonably  common.  For  instance,  the  ‘parameter  tuning’ 
method  used  in  Samuel’s  (1963)  checker  player  employed  a  form  of  hill  climbing,  and 
the  incremental  learning  schemes  used  in  neural  networks  can  also  be  viewed  in  this 
light.  Both  classes  of  algorithm  step  through  a  space  of  numeric  parameters,  with 
the  direction  and  amount  of  motion  controlled  by  the  most  recent  instance.  There 
is  no  memory  for  alternative  or  previous  states,  but  the  states  themselves  are  quite 
complex,  consisting  of  many  terms/links  and  their  associated  weights. 
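As a concrete illustration of stepping through a space of numeric parameters, the sketch below uses the Widrow-Hoff (LMS) rule. This rule is chosen here only as a familiar stand-in; it is not claimed to be the procedure used in Samuel's checker player or in any particular network model, and the function name `lms_step` and the learning rate are assumptions.

```python
from typing import List, Sequence


def lms_step(weights: Sequence[float], features: Sequence[float],
             target: float, rate: float = 0.1) -> List[float]:
    """One Widrow-Hoff (LMS) update driven by a single instance."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    # Direction and size of the step depend only on the latest instance;
    # the old weight vector is replaced, not remembered.
    return [w + rate * error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
for _ in range(50):
    weights = lms_step(weights, [1.0, 2.0], target=1.0)
```

Each call replaces the weight vector with its successor, so at any moment only one state exists in memory.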

Winston’s (1975) early work on learning from examples provides another instance
of the hill-climbing paradigm. In this case, each state consisted of a complex structural
definition of the goal concept, with operators for specializing and generalizing
this structure. As with Samuel’s system and the work on neural nets, there was no
explicit evaluation function, but given a new instance the system selected a single response.
Again, there was no memory for previous concept descriptions, so no explicit
backtracking could occur. However, the presence of inverse operators (generalization
could undo specialization and vice versa) could produce a backtrack-like effect in
certain cases.

Research  on  grammar  acquisition  has  also  employed  the  hill-climbing  metaphor. 
The  best  example  is  Wolff’s  (1982)  SNPR  system,  which  induced  a  phrase-structure 
grammar  from  sample  strings.  This  program  included  operators  for  defining  both 
chunks (words and phrases) and clusters (word classes). SNPR incorporated an evaluation
function that measured the tradeoff between a grammar’s simplicity and its
‘compression’ of the data. At each stage in its processing, the system defined the
chunk or cluster that led to the best value on this criterion. Wolff’s system was only
semi-incremental, processing a number of strings to compute the scores for competing
grammars. These grammars could become quite complex but, like the other systems
we have examined, it stored only one such structure in memory at a time.

1 Note that we have placed no restrictions on the complexity of the memory structures, the sophistication
of the evaluation function, the power of the state generator, or whether instances are stored.
The only limits involve memory for alternative states and the manner in which instances are used.

We  can  contrast  hill-climbing  theories  of  learning  with  methods  that  incorporate 
more  memory-intensive  search  schemes.  For  instance,  Mitchell  (1982)  describes  a 
depth-first  search  algorithm  for  learning  from  examples  that  remembers  both  instances 
and  previous  states.  He  also  describes  the  version  space  algorithm,  which  carries  out 
a  breadth-first  search  through  the  space  of  concept  descriptions  by  maintaining  a 
frontier  of  hypotheses.  Michalski  (1983)  describes  another  algorithm  that  employs  a 
beam  search;  this  uses  an  evaluation  function,  but  it  differs  from  hill-climbing  methods 
in  maintaining  the  N  best  states  at  each  level  of  the  search.  Some  strength-based 
methods,  such  as  those  proposed  by  Holland  (1986)  and  Langley  (1987),  come  closer 
to  the  hill-climbing  metaphor,  but  these  retain  competing  hypotheses  in  memory. 

In  the  remainder  of  the  paper,  we  will  present  two  models  of  learning  based  on  the 
hill-climbing  analogy,  both  drawn  from  the  UCI  branch  of  the  World  Modelers  Project 
(Carbonell  &  Hood,  1986;  Langley,  1986).  The  first  involves  the  task  of  incremental 
concept  formation,  in  which  the  learner  must  construct  a  concept  hierarchy  for  objects 
it  encounters  in  the  environment.  The  second  addresses  the  task  of  improving  motor 
skills  with  practice.  We  close  with  some  other  instances  of  hill-climbing  systems  that 
operate  in  more  symbolic  domains. 

2.  A  Model  of  Incremental  Concept  Formation 

Much  of  the  AI  research  on  concept  learning  has  occurred  within  the  ‘learning 
from  examples’  framework,  in  which  a  tutor  presents  positive  and  negative  instances 
of  goal  concepts  at  a  single  level  of  abstraction.  Yet  we  know  that  a  human  can 
acquire  concepts  in  the  absence  of  a  tutor,  and  human  memory  appears  to  have  a 
complex  hierarchical  organization.  In  recent  years,  research  in  conceptual  clustering 
(Michalski  &  Stepp,  1983;  Fisher  &  Langley,  1985)  has  responded  to  both  these 
issues. However, most of this work has assumed that learning is nonincremental and
that concepts are represented as necessary and sufficient conditions, neither of which
holds for human concept formation. In this section we present CLASSIT, a model that
acquires hierarchies of ‘fuzzy’ concepts using an incremental algorithm.

In the following pages we describe the system in terms of its representation of
data and concepts, its mechanisms for classification and learning, and the evaluation
function it employs to direct search through the space of concept hierarchies. The
model borrows from earlier concept formation systems, including Feigenbaum’s EPAM
(1963), Lebowitz’s UNIMEM (1986), and especially Fisher’s COBWEB (in press).2 Like
its three predecessors, CLASSIT can be viewed as a hill-climbing learning system.

2 We should note that CLASSIT’s learning algorithm is identical to that used in Fisher’s COBWEB,
and that the two systems differ only in their representations and evaluation functions. Many of our ideas
on hill-climbing approaches to learning emerged from discussions with Doug Fisher about COBWEB.
2.1 Representing Objects and Object Concepts

The World Modelers Project is concerned with learning in a reactive, physical
environment. Thus, CLASSIT accepts input consisting of descriptions for three-dimensional
physical objects. Each instance is specified as a set of cylinders having a
length, radius, location, and orientation. Marr (1982) has argued that such descriptions
constitute plausible output from the human vision system. This representation
is  heavily  numeric  and  differs  considerably  from  the  more  abstract  semantic  network 
and  predicate  calculus  representations  used  by  Winston  (1975)  and  others. 

For  instance,  our  model  represents  a  particular  animal  (say  a  cat)  as  a  set  of  eight 
cylinders  -  representing  the  head,  neck,  torso,  tail,  and  four  legs.  The  size,  shape, 
and  orientation  of  a  given  animal  are  represented  by  72  real-valued  attribute-value 
pairs,  with  nine  attributes  for  each  cylinder.  The  concept  for  a  cat  (as  distinct  from 
a  particular  cat)  is  represented  using  the  same  attributes,  but  specifying  the  mean 
and  variance  for  each  attribute  instead  of  a  particular  value.  Some  attributes  will 
vary  considerably,  while  others  will  be  nearly  constant;  the  latter  can  be  viewed  as 
more central (or criterial) to the concept than the former. Thus, both instances and
concept descriptions are closely linked to the sensory level.

2.2  Classification  and  Learning 

In  CLASSIT,  the  processes  of  classification  and  learning  are  intertwined;  one  cannot 
occur  without  the  other.  Concepts  are  organized  into  a  concept  hierarchy,  with  more 
general  concepts  on  top  and  their  more  specific  children  below.  Each  time  the  system 
encounters  a  new  instance,  it  sorts  that  instance  down  the  concept  hierarchy.  At  each 
level,  it  decides  whether  to  place  the  instance  into  an  existing  class  or  whether  to 
create  an  entirely  new  (disjunctive)  class.  In  the  former  case,  the  attribute-values  of 
the  new  instance  are  ‘averaged  into’  the  existing  means  and  variances;  this  changes 
the  ‘definition’  of  the  class.  The  instance  is  then  compared  to  the  children  of  this 
class  and  the  process  is  applied  recursively.  If  a  new  class  is  created,  the  values  of  the 
instance  become  the  initial  means  of  that  class.  Such  a  decision  actually  changes  the 
structure  of  the  concept  hierarchy. 
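The step of ‘averaging’ an instance into a class can be sketched as follows. The sketch uses Welford's online algorithm, a standard way to update means and variances one value at a time; the text does not say which update CLASSIT uses, so this choice, the class name `Concept`, its fields, and the use of the population variance are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Concept:
    """A concept node summarized by per-attribute means and variances."""
    count: int = 0
    mean: List[float] = field(default_factory=list)
    m2: List[float] = field(default_factory=list)  # running sums of squared deviations
    children: List["Concept"] = field(default_factory=list)

    @classmethod
    def from_instance(cls, values: Sequence[float]) -> "Concept":
        # A new class starts with the instance's values as its means.
        return cls(count=1, mean=list(values), m2=[0.0] * len(values))

    def add(self, values: Sequence[float]) -> None:
        """Average one instance into the existing means and variances."""
        self.count += 1
        for k, x in enumerate(values):
            delta = x - self.mean[k]
            self.mean[k] += delta / self.count
            self.m2[k] += delta * (x - self.mean[k])

    def variance(self) -> List[float]:
        # Population variance; dividing by count is an assumption.
        return [m / self.count for m in self.m2]


cat = Concept.from_instance([1.0, 10.0])
cat.add([3.0, 10.0])
```

Because only the count, means, and squared-deviation sums are kept, the update needs no access to earlier instances, which is what makes the scheme incremental.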

The  model  also  includes  operators  for  merging  and  splitting  classes;  these  provide 
some  ability  to  recover  from  poor  hierarchies  that  may  result  from  non-representative 
experiences  early  in  the  learning  process.  This  gives  a  backtracking-like  effect  without 
the  memory  overhead  of  that  mechanism.  In  summary,  the  system  is  incremental;  it 
retains  only  one  ‘hypothesis’  at  each  point  in  its  evolution,  and  it  has  no  memory  of  its 
earlier stages. The states themselves are quite complex, consisting of an entire hierarchy
of complex concept descriptions. This complexity makes the notion of retaining
multiple states seem implausible, and thus lends plausibility to the hill-climbing approach
we have taken in CLASSIT.
2.3 CLASSIT’s Evaluation Function

Most  clustering  systems  attempt  to  maximize  some  tradeoff  between  within-class 
similarity  and  between-class  differences.  In  a  similar  spirit,  CLASSIT  computes  both 
the  within-class  variance  W  and  the  between-class  variance  B  for  each  attribute  in  a 
potential  class.  These  terms  can  be  stated: 


\[
W = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{N - J + 1}
\qquad \text{and} \qquad
B = \frac{\sum_{j=1}^{J} n_j (\bar{x}_j - \bar{x})^2}{J}
\]

where $J$ is the number of classes, $N$ is the number of instances, $n_j$ is the number of
instances in class $j$, $x_{ij}$ is the value of the attribute for the $i$-th instance in class $j$,
$\bar{x}_j$ is the average value of the attribute for class $j$, and $\bar{x}$ is the
average over all classes in the partition (a set of disjoint classes). The first measure
corresponds to an attribute’s predictability (how well it is predicted by membership
in the class), whereas the second measure corresponds to an attribute’s predictiveness
(how well the attribute predicts membership in a class).

CLASSIT’s evaluation function - which we call category quality - takes both of
these terms into account, summing over all $K$ attributes:3

\[
\text{category quality} = \sum_{k=1}^{K} \frac{B_k}{W_k}
\]

This measure lets CLASSIT find clusterings of instances that maximize within-class
similarities and that maximize between-class differences. Note that the variance W for
a  class  incorporates  the  number  of  instances  in  that  group.  Retaining  this  number  lets 
the  model  incrementally  update  its  means  and  variances  (and  thus  category  quality) 
as  it  observes  new  instances. 

CLASSIT  uses  the  category  quality  metric  to  determine  which  action  to  take  at 
each  level  in  the  hierarchy.  The  system  considers  placing  the  new  instance  in  each  of 
the  existing  classes  and  computes  the  resulting  score.  Next  it  compares  the  best  of 
these  values  to  the  score  that  would  result  from  creating  a  new  class  containing  only 
that  instance.  The  program  then  forms  that  partition  with  the  best  score,  generating 
a  new  ‘state.’  CLASSIT  also  uses  this  measure  to  determine  when  to  combine  and 
decompose  concepts;  Fisher  (in  press)  provides  the  details  of  this  process. 
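The choice between placing an instance in an existing class and creating a new one can be sketched as below. The sketch rests on several assumptions that should be read as such: W is taken to be the pooled within-class sum of squares divided by N - J + 1, B the size-weighted squared deviation of class means from the overall instance mean divided by J, category quality the sum of B/W over attributes, and `MIN_VARIANCE` a hypothetical value for the minimum-variance parameter. It recomputes everything from stored instances for clarity rather than updating incrementally, and it omits merging and splitting.

```python
from typing import List, Sequence

Instance = Sequence[float]
Partition = List[List[Instance]]

MIN_VARIANCE = 0.01  # hypothetical floor that avoids division by zero


def category_quality(partition: Partition, min_var: float = MIN_VARIANCE) -> float:
    """Sum over attributes of between-class over within-class variance."""
    classes = [c for c in partition if c]
    J = len(classes)
    N = sum(len(c) for c in classes)
    K = len(classes[0][0])
    total = 0.0
    for k in range(K):
        grand_mean = sum(x[k] for c in classes for x in c) / N
        within = 0.0
        between = 0.0
        for c in classes:
            class_mean = sum(x[k] for x in c) / len(c)
            within += sum((x[k] - class_mean) ** 2 for x in c)
            between += len(c) * (class_mean - grand_mean) ** 2
        W = max(within / (N - J + 1), min_var)
        B = between / J
        total += B / W
    return total


def best_partition(partition: Partition, instance: Instance) -> Partition:
    """Try the instance in each existing class and in a new class; keep the best."""
    candidates = []
    for j in range(len(partition)):
        candidate = [list(c) for c in partition]
        candidate[j].append(instance)
        candidates.append(candidate)
    candidates.append([list(c) for c in partition] + [[instance]])
    return max(candidates, key=category_quality)


start = [[(0.0,), (0.2,)], [(10.0,), (10.2,)]]
near = best_partition(start, (0.1,))   # joins the first class
far = best_partition(start, (20.0,))   # forms a class of its own
```

Only the winning partition is returned; the losing candidates are dropped, which mirrors the single-state constraint of the hill-climbing framework.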

2.4  Experimental  Results 

We  have  evaluated  CLASSIT’s  behavior  under  a  variety  of  conditions.  Figure  1 
summarizes an experiment in which we ‘defined’ four classes - cats, dogs, horses, and
giraffes - with different amounts of variation. The column labeled ‘exact’ represents
runs  in  which  all  members  of  a  class  were  identical,  giving  zero  within-class  variation. 

3 If a class has only one member, then its variance is zero and division by W is undefined. To avoid
this problem, we use a minimum variance for each attribute. This parameter corresponds to the notion
of a ‘just noticeable difference’ in psychophysics.
Figure 1. Category qualities resulting from different within-class variation.


The  ‘tight’,  ‘medium’  and  ‘loose’  conditions  introduced  successively  more  variance. 
The  scores  for  each  column  were  averaged  over  ten  executions,  each  with  30  instances 
that  were  randomly  generated  from  means  and  variances  for  each  category.  The 
heights  of  the  light  bars  indicate  the  average  category  quality  of  the  final  hierarchy 
after  CLASSIT  processed  the  30th  instance.4 

The graph shows that as the regularity within each category decreases, the category
quality also decreases. It is difficult to determine whether CLASSIT is actually
finding the optimal clustering in each case, since this would require an exhaustive
search of the clustering space. However, we have noted that as variation increases,
the system’s hierarchies tend to diverge from the ‘desired’ hierarchies used to generate
the data, typically producing more than the four ‘ideal’ top-level categories. This
suggests  that  CLASSIT’s  behavior  degrades  as  the  within-class  variation  increases  and 
between-class  variance  decreases,  as  one  would  expect.  The  system  performs  well  in 
orderly  environments,  but  its  ability  falls  off  as  more  variation  occurs. 

This  version  of  CLASSIT  retains  all  instances  it  has  ever  encountered,  storing  them 
as  terminal  nodes  in  the  concept  hierarchy.  Although  this  does  not  conflict  with  our 
hill-climbing  philosophy,  it  does  clash  with  our  intuitions  about  human  long-term 
memory.  Thus,  we  have  also  tested  a  ‘memory-limited’  version  that  constrains  the 
depth  of  the  concept  hierarchy.  Naturally,  this  variant  loses  information  that  the 
unlimited-memory  version  retained,  and  this  limits  the  extent  to  which  the  program 
can  simulate  ‘backtracking’  by  combining  and  decomposing  existing  categories.  This 
in  turn  makes  the  program  more  sensitive  to  the  order  in  which  it  encounters  instances. 
The  heights  of  the  dark  bars  in  Figure  1  show  the  scores  that  result  when  CLASSIT 
retains  only  one  level  of  categories.  Except  for  the  ‘exact’  condition,  the  system’s 
behavior  clearly  degrades  as  memory  limitations  are  introduced.  However,  its  behavior 
still  serves  as  a  reasonable  approximation  of  the  original,  while  considerably  reducing 
the  load  on  memory. 

4 Since instances were created with a random number generator, different runs within a condition
could produce quite different hierarchies. This was our reason for averaging across ten executions.
3. A Model of Motor Skill Improvement

Although  many  researchers  have  examined  procedural  learning,  there  has  been 
little AI work on the improvement of motor skills. Our concern with reactive environments
led us to implement a simulated jointed arm and to use this in modeling
motor behavior. Below we describe MAGGIE, our model of motor skills and their acquisition.
As with the work on concept formation, we have tried to remain consistent
with  knowledge  of  human  behavior.  Again,  we  begin  with  representational  issues  and 
then  turn  to  problems  of  performance  and  learning.  Naturally,  the  latter  incorporates 
a  hill-climbing  approach. 

3.1  Two  Representations  for  Motor  Schemas 

Following  Schmidt  (1982),  we  will  use  the  term  motor  schema  to  refer  to  some 
stored description of a motor skill. More precisely, we represent a schema as a temporal
sequence of points (X_1, X_2, ..., X_n), where each point describes the location and
velocity  for  the  joints  involved  in  the  schema.  Within  this  framework,  two  natural 
representations  suggest  themselves,  each  based  on  a  different  coordinate  system. 

The  first  scheme  uses  Cartesian  three-space  with  the  origin  at  the  base  (the  first 
joint)  of  the  arm.  We  will  call  this  a  viewer-centered  representation.  It  corresponds  to 
the  view  an  agent  receives  as  it  carries  out  the  skill.  We  assume  that  such  information 
is  available  from  the  sensory  system  during  execution  of  the  motor  schema.  Thus, 
this  framework  can  be  used  for  recognition  and  monitoring  purposes. 

An  alternative  representation  involves  joint-centered  descriptions,  in  which  each 
joint  has  its  own  spherical  coordinate  system.  The  coordinate  system  for  a  particular 
joint  is  defined  in  relation  to  the  joint  to  which  it  is  connected.  For  instance,  the 
coordinates  for  an  elbow  would  be  described  in  the  reference  frame  of  its  associated 
shoulder  joint.  Thus,  each  joint  has  a  coordinate  system  in  which  location  and  velocity 
are  represented  using  distance  from  the  origin,  an  angle  of  rotation  about  the  x-axis, 
and  an  angle  of  rotation  about  the  y-axis.  We  assume  this  form  of  information  is 
available  as  proprioceptive  feedback  during  execution;  this  representation  can  also  be 
used  to  actually  generate  motor  behavior. 
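The relation between the two coordinate systems can be made concrete for a planar two-jointed arm. The sketch below is a simplification under stated assumptions: it works in two dimensions, fixes the link lengths (so each joint is described by a single angle rather than full spherical coordinates), ignores velocities, and uses the standard closed-form inverse kinematics with one of the two possible elbow configurations. The function names and unit link lengths are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def forward_kinematics(shoulder: float, elbow: float,
                       upper: float = 1.0, fore: float = 1.0) -> Tuple[Point, Point]:
    """Joint-centered angles -> viewer-centered positions of elbow and hand.

    The elbow angle is measured relative to the upper arm, i.e. in the
    reference frame of the shoulder joint.
    """
    ex = upper * math.cos(shoulder)
    ey = upper * math.sin(shoulder)
    hx = ex + fore * math.cos(shoulder + elbow)
    hy = ey + fore * math.sin(shoulder + elbow)
    return (ex, ey), (hx, hy)


def inverse_kinematics(x: float, y: float,
                       upper: float = 1.0, fore: float = 1.0) -> Tuple[float, float]:
    """Viewer-centered hand position -> joint angles (one elbow configuration)."""
    d = (x * x + y * y - upper * upper - fore * fore) / (2.0 * upper * fore)
    d = max(-1.0, min(1.0, d))  # clamp against rounding at the reach limit
    elbow = math.acos(d)
    shoulder = math.atan2(y, x) - math.atan2(fore * math.sin(elbow),
                                             upper + fore * math.cos(elbow))
    return shoulder, elbow


elbow_pos, hand_pos = forward_kinematics(0.0, math.pi / 2)
angles = inverse_kinematics(*hand_pos)
```

The inverse mapping must resolve the shoulder before the elbow, which gives a small picture of why translation between the representations proceeds joint by joint.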

3.2  Generating  Motor  Programs 

We  will  assume  MAGGIE  has  somehow  acquired  a  viewer-centered  schema  that 
describes  some  desired  behavior.  The  first  step  in  carrying  out  this  skill  involves 
translating  the  viewer-centered  description  into  a  joint-centered  representation  that 
can  be  directly  executed.  We  will  not  consider  the  details  of  this  transformation 
process, but we will assume that it is serial in nature, and therefore costly. Transformations
must be done for each joint in a serial manner, starting with the base joint
and considering each successive joint in turn.

However,  the  joint-centered  representation  specifies  only  selected  points  involved 
in  the  skill;  to  actually  generate  behavior,  one  must  have  the  desired  locations  and 
velocities for every joint at every point in time. We will use the term motor program
to  refer  to  such  an  interpolated  schema.  Motor  programs  are  not  stored  in  memory; 
they  are  generated  in  real  time  as  the  skill  is  executed.  In  our  model,  the  agent 
interpolates  the  points  making  up  a  motor  program  by  generating  a  spline  for  each 
joint,  connecting  the  sparser  points  in  the  joint-centered  schema.  There  is  evidence 
that  humans  can  ‘set’  their  limbs  in  desired  positions  even  in  the  absence  of  feedback. 
Thus,  we  have  not  attempted  to  model  the  low  level  mechanisms  by  which  an  arm 
actually  moves;  it  simply  follows  the  specified  motor  program. 

The interpolation process leads to smooth curves that cross the specified points at
the desired velocities. However, the interpolated locations and velocities may be quite
different from those that would result from interpolating the viewer-centered schema.
For instance, a schema for moving the hand in a straight line can be specified in
viewer-centered coordinates using a few points, and splining these points would in fact
produce straight line behavior. However, when MAGGIE translates this schema into
joint-centered coordinates and uses splining to generate a motor program, a sequence
of arcs results, with the end of each arc corresponding to a point in the motor schema.
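The divergence between the two interpolations can be checked numerically. The sketch below is an illustration under simplifying assumptions, not MAGGIE's procedure: it uses a planar two-jointed arm with unit link lengths, ignores velocities, and substitutes linear interpolation for splines in both coordinate systems. All names are hypothetical.

```python
import math
from typing import Tuple

UPPER, FORE = 1.0, 1.0  # hypothetical link lengths


def hand_position(shoulder: float, elbow: float) -> Tuple[float, float]:
    """Viewer-centered hand position for the given joint angles."""
    return (UPPER * math.cos(shoulder) + FORE * math.cos(shoulder + elbow),
            UPPER * math.sin(shoulder) + FORE * math.sin(shoulder + elbow))


def joint_angles(x: float, y: float) -> Tuple[float, float]:
    """Joint angles that place the hand at (x, y), one elbow configuration."""
    d = (x * x + y * y - UPPER ** 2 - FORE ** 2) / (2.0 * UPPER * FORE)
    elbow = math.acos(max(-1.0, min(1.0, d)))
    shoulder = math.atan2(y, x) - math.atan2(FORE * math.sin(elbow),
                                             UPPER + FORE * math.cos(elbow))
    return shoulder, elbow


def lerp(a: float, b: float, s: float) -> float:
    return a + (b - a) * s


def max_deviation(start, end, steps: int = 50) -> float:
    """Largest gap between the joint-space path and the straight viewer-space line."""
    a0, a1 = joint_angles(*start), joint_angles(*end)
    worst = 0.0
    for i in range(steps + 1):
        s = i / steps
        actual = hand_position(lerp(a0[0], a1[0], s), lerp(a0[1], a1[1], s))
        desired = (lerp(start[0], end[0], s), lerp(start[1], end[1], s))
        worst = max(worst, math.hypot(actual[0] - desired[0],
                                      actual[1] - desired[1]))
    return worst


gap = max_deviation((0.5, 1.0), (1.5, 1.0))
```

For this horizontal movement the hand bows away from the intended line between the two endpoints while matching it exactly at the endpoints themselves, which is the arc-like behavior described above.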

3.3  Recovering  from  Errors 

In  other  words,  translating  from  the  initial  representation  to  an  executable  one  can 
introduce  errors.  This  means  the  performance  system  must  be  able  to  monitor  its  own 
behavior  and  to  correct  errors  as  they  occur.  In  our  model,  this  is  done  by  generating 
a  ‘pseudo-motor  program’  by  splining  points  in  the  viewer-centered  representation  and 
comparing  these  to  the  actual  points  generated  as  the  motor  program  runs.  MAGGIE 
cannot  execute  the  pseudo-program,  but  it  does  specify  the  desired  position  at  each 
time  during  execution.  When  the  monitoring  process  notices  a  significant  difference 
(i.e.,  exceeds  a  threshold),  it  invokes  the  error  correction  process. 

This  mechanism  applies  a  ‘burst  of  force’  in  a  direction  that  will  reduce  the  size 
of  the  error.  The  correction  function  has  an  inverted  U  shape,  starting  with  minor 
alterations,  increasing  to  a  peak,  and  then  decreasing  to  zero  after  a  time.  If  the  error 
does not increase or decrease, the path of the limb will return to the desired path
after  the  correction  process  has  ended.  However,  whether  this  occurs  will  depend  on 
the  nature  of  the  movement.  If  the  error  had  been  increasing  when  it  was  detected, 
then  undercompensation  will  occur.  If  it  had  been  decreasing,  then  overcompensation 
will  cause  the  arm  to  overshoot  the  mark.  In  such  cases,  the  agent  must  reinvoke  the 
error  recovery  mechanism  a  number  of  times. 
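The three outcomes can be reproduced with a one-dimensional toy model. The sketch assumes a half-sine profile for the inverted U and assumes that the burst is sized to cancel exactly the error present at detection; neither detail is given in the text, and the names `burst_profile` and `residual_error` are hypothetical.

```python
import math
from typing import List


def burst_profile(steps: int) -> List[float]:
    """Inverted-U weights that sum to 1 (half-sine shape, an assumption)."""
    raw = [math.sin(math.pi * (i + 0.5) / steps) for i in range(steps)]
    total = sum(raw)
    return [w / total for w in raw]


def residual_error(detected_error: float, drift_per_step: float,
                   steps: int = 20) -> float:
    """Error left when the burst ends, given how the error was changing."""
    error = detected_error
    for weight in burst_profile(steps):
        error += drift_per_step            # the error keeps evolving on its own
        error -= weight * detected_error   # correction sized for the detected error
    return error


steady = residual_error(1.0, 0.0)     # error was constant when detected
growing = residual_error(1.0, 0.01)   # error was increasing: undercompensation
shrinking = residual_error(1.0, -0.01)  # error was decreasing: overshoot
```

A nonzero residual in the second and third cases is what forces the monitoring process to invoke the correction mechanism again.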

3.4  Improving  Motor  Schemas 

Although  monitoring  and  error  correction  give  immediate  aid  in  carrying  out 
desired  behaviors,  learning  provides  a  longer-term  solution.  Although  the  viewer- 
centered  and  joint-centered  representations  lead  to  different  interpolated  behavior, 
one  scheme  can  be  made  to  approximate  the  other  by  adding  selected  points  to  the 
schema.  For  instance,  one  can  simulate  straight-line  behavior  with  a  joint-centered 
schema by connecting a sequence of very small arcs. Although other forms of learning
are possible, in this paper we will focus on learning by the addition of points to the
joint-centered description.

We  have  seen  that  error  detection  invokes  the  error  recovery  process,  but  it  also 
serves  as  the  trigger  for  learning.  Whenever  the  path  of  a  joint  diverges  noticeably 
from  the  desired  path,  MAGGIE  attempts  to  add  another  point  to  its  joint-centered 
schema. Learning occurs only after the execution has been completed, with the location
and velocity of the added point being based on the largest error that was detected
during  the  run.  Thus  larger  errors  are  reduced  before  smaller  ones,  giving  a  learning 
curve  roughly  similar  to  the  power  laws  observed  in  human  skill  acquisition. 
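The learning rule can be illustrated with a simplified simulation. The sketch is not MAGGIE itself: it assumes a planar arm with unit link lengths, ignores velocities and on-line error correction, uses linear interpolation in place of splines, and takes the target to be a straight hand path. The threshold value and all names are hypothetical.

```python
import math

UPPER, FORE = 1.0, 1.0  # hypothetical link lengths


def hand(shoulder, elbow):
    return (UPPER * math.cos(shoulder) + FORE * math.cos(shoulder + elbow),
            UPPER * math.sin(shoulder) + FORE * math.sin(shoulder + elbow))


def angles(x, y):
    d = (x * x + y * y - UPPER ** 2 - FORE ** 2) / (2.0 * UPPER * FORE)
    elbow = math.acos(max(-1.0, min(1.0, d)))
    shoulder = math.atan2(y, x) - math.atan2(FORE * math.sin(elbow),
                                             UPPER + FORE * math.cos(elbow))
    return shoulder, elbow


def desired(s, start, end):
    """Point on the straight viewer-centered path at normalized time s."""
    return (start[0] + (end[0] - start[0]) * s,
            start[1] + (end[1] - start[1]) * s)


def interpolate(schema, s):
    """Joint angles at time s, from a schema of (time, shoulder, elbow) points."""
    for (s0, a0, b0), (s1, a1, b1) in zip(schema, schema[1:]):
        if s0 <= s <= s1:
            f = (s - s0) / (s1 - s0)
            return a0 + f * (a1 - a0), b0 + f * (b1 - b0)
    raise ValueError("time outside schema")


def run_trial(schema, start, end, steps=100):
    """Execute the schema once; return the largest error and when it occurred."""
    worst_err, worst_s = 0.0, None
    for i in range(steps + 1):
        s = i / steps
        ax, ay = hand(*interpolate(schema, s))
        dx, dy = desired(s, start, end)
        err = math.hypot(ax - dx, ay - dy)
        if err > worst_err:
            worst_err, worst_s = err, s
    return worst_err, worst_s


def practice(start, end, trials=8, threshold=0.01):
    schema = [(0.0, *angles(*start)), (1.0, *angles(*end))]
    errors = []
    for _ in range(trials):
        err, s = run_trial(schema, start, end)
        errors.append(err)
        if err > threshold:
            # After the run, add one point where the largest error occurred.
            schema.append((s, *angles(*desired(s, start, end))))
            schema.sort()
    return errors, schema


errors, schema = practice((0.5, 1.0), (1.5, 1.0))
```

Only the current schema is kept between trials; each run either leaves it unchanged or extends it by a single point, so the procedure stays within the hill-climbing constraints.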

In  this  manner,  MAGGIE  gradually  transforms  its  initial,  sparsely  defined  motor 
schema  -  containing  only  a  few  points  -  into  a  more  detailed  schema  containing  many 
points.  This  incremental  process  continues  until  the  monitoring  can  no  longer  detect 
any  differences  or  until  the  addition  of  new  points  fails  to  improve  performance.  Of 
course, some behaviors require more learning than others; since the joint-centered representation
describes arc-like motions quite well, skills involving such motions require
the insertion of many fewer points.

The details of this model differ radically from our theory of concept formation, but
note that the overall idea is the same. MAGGIE’s schemas begin as relatively simple
structures, and details are added as it gains experience in a domain. Our model
of motor learning retains no memory of instances or previous schemas, nor does it
maintain competing alternatives in memory. Although MAGGIE uses an intelligent
generator in place of an evaluation function, it meets all the criteria set forth at the
outset and constitutes another instance of a hill-climbing theory of learning.

3.5  Experimental  Studies 

Our  model  is  independent  of  a  limb’s  dimensions  and  rotational  constraints,  but  we 
have  tested  the  system  using  a  two-jointed  arm  with  roughly  human  characteristics. 
This  includes  an  upper  arm  and  a  forearm,  the  first  rotating  at  a  shoulder  joint  and 
the  second  at  an  elbow  joint.  We  have  run  a  number  of  experimental  studies  with 
MAGGIE,  all  in  two  dimensions.  For  instance,  we  have  shown  that,  as  in  humans,  there 
is  a  tradeoff  between  the  speed  at  which  a  motor  skill  is  executed  and  its  accuracy. 
We  have  also  studied  the  relation  between  speed  of  execution  and  overcompensation 
effects.  However,  these  involve  the  performance  of  the  system,  and  our  focus  here  lies 
with  learning. 

Naturally, we would expect that as the system detects errors and adds new points
to its joint-centered schema, its errors will decrease on later executions. Figure 2 shows
the results of a series of eight runs with the ‘straight line’ schema, indicating that the
model’s performance gradually improves with practice. Figure 3 presents another
result  that  makes  intuitive  sense.  As  the  skill  level  improves,  the  tradeoff  between 
speed  and  accuracy  becomes  less  evident.  As  more  points  are  added  to  the  schema,  its 
behavior  comes  to  approximate  the  desired  behavior  even  without  monitoring.  This 
means that MAGGIE can execute the schema at a higher speed - even though there
are  fewer  chances  for  monitoring  -  without  seriously  diverging  from  the  target  path. 
This  simulates  the  gradual  transition  of  motor  skills  from  closed-loop  processing  to 
open-loop  mode,  in  which  feedback  is  unnecessary. 


Figure 2. Error as a function of practice.

Figure 3. Speed vs. accuracy after 1, 2, and 4 learning trials.


4.  Generality  of  the  Hill-Climbing  Metaphor 

Both CLASSIT and MAGGIE employ low-level sensory-motor representations, since
we feel such representations play an important role in learning about complex physical
environments. However, the hill-climbing approach is not limited to such representations.
For instance, Rose and Langley (1986) have described STAHLp, a model of
scientific reasoning that incorporates techniques from belief revision and truth maintenance.
The system operates in the domain of chemistry, accepting chemical reactions
as input and generating componential models of various substances as output. At
each point in its search, the system holds a set of beliefs that cover the known data.

Upon finding an inconsistency in its belief structures, STAHLp invokes an assumption-based
reasoning technique that identifies the problematic premises and suggests
changes  that  would  eliminate  the  inconsistency.  The  program  then  evaluates  each 
modification  in  terms  of  its  impact  on  the  belief  system,  selecting  the  revision  that 
causes  the  least  overall  change.  Despite  its  complex  reasoning  processes,  STAHLp  can 
be  viewed  as  a  hill-climbing  learner  in  the  sense  we  have  defined  the  term.  At  each 
point,  the  system  maintains  a  single  ‘state’  in  memory  -  its  entire  belief  system  -  and 
when  change  is  required,  it  selects  a  single  successor  state  from  a  set  of  alternatives. 
Once  the  new  belief  system  has  emerged,  the  program  has  no  memory  for  previous 
states  or  for  competing  belief  systems. 

It seems natural to associate the hill-climbing metaphor with empirical learning
systems like CLASSIT and MAGGIE, but the approach can also be used within an
analytic or explanation-based framework. Given a positive instance of some concept
or operator application, an analytic learning system constructs some explanation for
why that instance satisfies the goal concept. Using the proof tree from the explanation,
it then formulates a general rule that can be used in future cases.

Hill-Climbing Theories of Learning

Although  the  second  step  in  this  process  (from  explanation  to  rule)  is  algorithmic, 
the  first  step  (constructing  an  explanation)  can  involve  considerable  search  and  can 
invoke  heuristic  techniques  to  evaluate  the  quality  of  competing  explanations.  If  the 
learner  selects  only  one  explanation  (or  even  a  few)  to  transform  into  rules,  then 
we  have  another  case  of  hill-climbing  learning.  At  each  step,  only  one  state  exists  in 
memory  -  the  set  of  rules  that  constitute  the  compiled  proofs  of  previous  explanations. 
There  is  no  memory  for  previous  states  to  support  backtracking,  nor  is  there  any 
memory  of  explanations  that  were  abandoned  in  favor  of  better  ones. 

In summary, the hill-climbing framework extends across the traditional boundaries
of machine learning. It can be applied to symbolic or sub-symbolic representations,
and it can be used in conjunction with weak (empirical) learning methods
or knowledge-intensive (analytic) methods. We believe that many aspects of human
learning operate in this mode, and we have presented evidence - through CLASSIT
and MAGGIE - that viable and interesting learning can occur in this fashion. We
encourage  other  researchers  to  explicitly  adopt  the  hill-climbing  metaphor,  and  to 
explore  the  characteristics  of  this  constrained  but  promising  approach  to  learning. 